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As part of fMRI data analysis, the pyhrf package provides a set of tools for addressing 
the two main issues involved in intra-subject fMRI data analysis: (1) the localization of 
cerebral regions that elicit evoked activity and (2) the estimation of activation dynamics 
also known as Hemodynamic Response Function (HRF) recovery. To tackle these two 
problems, pyhrf implements the Joint Detection-Estimation framework (JDE) which 
recovers parcel-level HRFs and embeds an adaptive spatio-temporal regularization scheme 
of activation maps. With respect to the sole detection issue (1), the classical voxelwise 
GLM procedure is also available through nipy, whereas Finite Impulse Response (FIR) 
and temporally regularized FIR models are concerned with HRF estimation (2) and are 
specifically implemented in pyhrf. Several parcellation tools are also integrated such 
as spatial and functional clustering. Parcellations may be used for spatial averaging prior 
to FIR/RFIR analysis or to specify the spatial support of the HRF estimates in the JDE 
approach. These analysis procedures can be applied either to volume-based data sets or 
to data projected onto the cortical surface. For validation purpose, this package is shipped 
with artificial and real fMRI data sets, which are used in this paper to compare the outcome 
of the different available approaches. The artificial fMRI data generator is also described 
to illustrate how to simulate different activation configurations, HRF shapes or nuisance 
components. To cope with the high computational needs for inference, pyhrf handles 
distributing computing by exploiting cluster units as well as multi-core machines. Finally, a 
dedicated viewer is presented, which handles n-dimensional images and provides suitable 
features to explore whole brain hemodynamics (time series, maps, ROI mask overlay). 
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1. INTRODUCTION 

As Magnetic Resonance Imaging (MRI) is a growing imaging 
modality in neuroscience, the need for powerful tools to explore 
the increasing amount of data is more and more significant. 
This data growth is quantitative as cohort sizes are getting bigger 
through the development of international multi-center projects 
like the Human Brain Project (Kandel et al, 2013) but also qual- 
itative as high field magnets become more and more available 
(Duyn and Koretsky, 2011). Functional MRI (fMRI), in partic- 
ular, benefits from these improvements. The experimenter has 
access to finer spatial (~1 mm) and temporal (~1 s) resolutions 
and also higher signal-to-noise ratio (SNR). In particular, the 
higher temporal resolution combined with higher SNR allows 
for a better recovery of dynamic processes so that we are no 
longer limited to only static mappings of cerebral activity. In 
this context, pyhrf aims to extract dynamic features from 
fMRI data, especially the Blood Oxygenated Level Dependent 
(BOLD) modality (Ogawa et al, 1990). The observed BOLD sig- 
nal is an indirect measure of the neural activity via the oxygen 



variation induced by the neuro-vascular coupling. Therefore, 
analysis methods have to formalize a hemodynamic model in 
order to make inference on neural processes. However, even if 
BOLD variations are known to correlate with neural activity, it 
is difficult to disentangle the dynamics of neural and the vascular 
components. As the employed methodology mainly uses linear 
systems, dynamic processes are summarized within the so-called 
Hemodynamic Response Function (HRF), which is the impulse 
response that links neuronal activity to the fMRI signal. In fact, 
the package offers various tools to analyze evoked fMRI data 
ranging from spatial mappings such as those provided by the 
General Linear Model (GLM) framework (Friston et al., 1995) to 
finer hemodynamics models as provided by the joint detection- 
estimation (JDE) approach described in Makni et al. (2005, 2008) 
and Vincent et al. (2010). Through a bilinear and time-invariant 
system, the JDE approach models an unknown HRF at the level 
of a group of voxels (referred to as a parcel in the following) as 
well as voxel- and condition-specific response levels to encode the 
local magnitudes of this response. The HRF is only constrained to 
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be smooth (temporal regularization) and can cover a wide vari- 
ety of shapes. The response levels are spatially regularized within 
each parcel. Hence, the JDE approach is a spatially adaptive GLM 
built on unknown parcel-dependent HRFs with spatio-temporal 
regularization. 

The use of each tool depends on a choice of model which is 
driven by the features required by the experimenter's goal. To 
obtain classical detection results, a GLM based on the canoni- 
cal HRF (and possibly its temporal derivatives) may be sufficient. 
Even if the between-region hemodynamic variability is acknowl- 
edged, the canonical HRF can provide good results in regions 
where it has been calibrated such as temporal and occipital cor- 
tices as studied by Boynton et al. (1996). However, in regions 
involving more complex processes, HRF derivatives or function 
bases may not be enough to capture the hemodynamic fluctua- 
tions allowing activation detection. Hemodynamics delays may 
happen due to varying reaction delays or pathological cases. 
Moreover, if one is interested in studying the dynamic features 
of the response, an explicit HRF estimation is required. The 
main question in this case concerns whether there is a need 
for condition-specific features or not. That is, a need for an 
HRF estimation associated with each experimental condition 
or for a single HRF estimate associated with aU conditions. If 
explicit condition-wise HRFs are required, the best methodologi- 
cal tool to use is the temporally Regularized FIR (RFIR) developed 
in Ciuciu et al. (2003) and Marrelec et al. (2003). Otherwise, 
if variability is only expected across separated and specialized 
regions, the JDE framework is well-suited. Indeed, within a spe- 
cialized region, if only one condition exhibits activity then the 
region-specific HRF can be considered a condition-specific HRF. 
The performance of RFIR models depends nonetheless on the 
number of experimental conditions involved in the paradigm 
because the higher this number, the larger the number of param- 
eters to estimate, and thus the fewer the number of degrees 
of freedom for statistical testing. The model choice thus also 
depends on the experimental paradigm. First, it is worth noticing 
that the use of the JDE formulation is less relevant in the analysis 
of block paradigm data since the signal variability in this case is 
small. The JDE formalism is actually more adapted to fast event- 
related paradigms or to paradigms including many conditions, 
like the localizer paradigm (10 conditions) introduced by Pinel 
et al. (2007) and used hereafter in this paper. The JDE approach 
is also optimally tuned to combined analysis of hemodynamics 
features with the detection of activated brain areas. To summa- 
rize, the JDE model choice provides a fair compromise with the 
possibility for the user to adapt the model to the studied region. 

JDE also delivers interesting and complementary results for 
the sole activation-detection aspect compared with classical GLM. 
Spatial regularization, which is necessary due to the low SNR 
in fMRI, is not enforced in the same way between methods. In 
the GLM, FIR, and RFIR cases, there is no embedded spatial 
regularization within models. Indeed, the data are usually spa- 
tially smoothed with a fixed Gaussian kernel in preprocessing. 
In contrast, JDE incorporates spatial correlation through hidden 
Markov models. The amount of spatial correlation is automat- 
ically tuned and also adaptive across brain regions, therefore 
avoiding any prior invariant smoothing. 



pyhrf is mainly written in python with some C code to 
cope with computationally demanding parts of algorithms. This 
python choice has been made possible thanks to the nipy ' 
project and especially nibabel^ to handle data reading/writing 
in the NIFTI format. 

In terms of package maturity, pyhrf is a research tool which 
has the ambition to target cognitive neuroscientists and clinicians. 
Efforts are made in terms of user-friendliness and the design is a 
trade-off between mutability which is required by methodologi- 
cal research where specifications change frequently and usability 
where user interfaces should be as stable as possible to ease 
external non-developer use cases. 

The rest of the paper is organized as follows. First, meth- 
ods available in the package are presented, comprising parcella- 
tion and detection/estimation analyses. Then, the workflow and 
design of the pyhrf package are detailed. These cover the user 
interface and code snippets for the main analysis treatments, sim- 
ulation framework, distributed computations and data viewer. 
Results illustrate the outcome of geometric and functional par- 
cellations and their impact on detection/estimation treatments. 
Finally, conclusions are drawn and perspectives for future devel- 
opments are indicated. 

2. METHODS 

The are two main kinds of fMRI data analysis methods avail- 
able in pyhrf: (1) parceUation tools that segment the brain into 
disjoint sets of positions and (2) activation detection/HRF esti- 
mation tools that highlight correlations between the input exper- 
imental paradigm and variations in the measured fMRI signal. 
The first kind comprises two spatial parceUation tools: Voronoi- 
based random parceUation, as reviewed by Aurenhammer and 
Klein (2000) and balanced partitioning, developed in Elor and 
Bruckstein (2009). The second kind comprises the GLM intro- 
duced in Friston (1998), the FIR model described in Henson et al. 
(2000), the RFIR model developed in Ciuciu et al. (2003) and 
the JDE approach presented in Vincent et al. (2010) and Risser 
et al. (201 1). The GLM and FIR GLM procedures are provided by 
nipy while RFIR and JDE are originaUy implemented in pyhrf. 
For all these methods, we refer to their respective bibliographi- 
cal references for an extensive presentation of their methodology. 
Nonetheless, the main aspects of these methods are summarized 
in what foUows to allow comparison between them, especiaUy in 
terms of model structure and assumptions. 
After detaUing notations, we introduce detection/estimation 
methods, namely GLM, FIR and RFIR, which require the mea- 
sured fMRI signal and the timing of the experimental paradigm 
as input. After setting the generative model common to all detec- 
tion/estimation methods and a brief comparative overview, each 
approach is presented in more details. Subsequently, parceUation 
methods are presented. Spatial parceUation approaches can be 
applied directly to the input fMRI data and only depend on its 
geometry. Functional parceUation, which is a clustering of GLM 
results, is detailed afterwards. 



www.nipy.org 
www.nipy.org/nibabel 
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2.1. NOTATION 
2.7.7. Conventions 

We denote vectors with bold lower case (e.g., y) and matrices 
with bold upper case letters (e.g., P). A vector is by convention a 
column vector. Scalars are denoted with non-bold lower case let- 
ters (e.g., a). The transpose operation is denoted by Probability 
distribution functions (pdf) are denoted using calligraphic letters 
(e.g., M and Q for the Gaussian and gamma distributions). 

2. 1.2. Data geometry 

As methods can be applied to data defined in the volume or on the 
cortical surface, the generic term "position" will be used in place 
of "voxel" (volume unit) and "node" (surface mesh unit). Position 
indexes are denoted by = 1 : / to indicate a range between 1 
and /. Data is assumed to be masked to only allow positions 
within the brain. / is the total number of positions within the 
functional mask. In addition, when considering parcellated data, 
this functional mask is divided into a set of F parcels denoted 

P = {-Pi Vy,..., Vr}, where Vy is the set of }y = \Vy\ 

position indexes belonging to parcel y . 

2.1.3. Functional data 

We consider the usual case of evoked fMRI data analysis where 
the experimental paradigm comprising M conditions is known. 
The signal measured at each time of repetition (TR) is denoted 
yj = lyj^n}n = i:N where N is the number of scans. Stimulus tim- 
ing onsets for a given experimental condition m = 1 : M are 
encoded by variable x" so that x"' = 1 if a stimulus occurs at 
time f up to a time step A f, else X™ = 0. The time step is such that 
At ^ TR and depends on the actual temporal resolution sought 
by the analysis method. 

2.2. DETECTION/ESTIMATION METHODS 

For ease of comparison, the presentation of all methods is 
immersed in the same formalism where the signal is assumed 
generated by a linear and time-invariant (convolution) system 
with additive noise. We also consider the usual case of taking into 
account a position-specific low frequency drift in the data which 
is a well known fMRI artifact produced by the aliasing of respi- 
ratory and cardiac rhythms into the low frequencies as studied in 
Yan et al. (2009). The generic forward model, reads: 

M 

yj=J2X'"<l>h+Pij + bj, (1) 

m=l 

where: 

• i-* is a fixed orthonormal basis that takes a potential drift 
and any other nuisance effect (e.g., motion parameters) into 
account. The low-frequency drift can classically be either poly- 
nomial with an order up to 5 or cosine with a cut-off of 
0.01 Hz, 

• Ij are the unknown regression weights associated with P, 

• bj is the noise component, 

• 0™ is a "generic" hemodynamic filter of size D. For a typi- 
cal duration of 25 s, D = 25/TR for the GLM and FIR GLM' 



approaches, while D = 25/(^/4) for the RFIR and JDE 
approaches considering a typical oversampling factor of 4. In 
the GLM framework, 0™ can be fixed to the canonical HRF 
or parametric when resorting to function bases and we will 
note R the number of unknown parameters. In non-parametric 
approaches, all HRF coefficients are estimated as in RFIR or 
JDE approaches, 

• X'" is the NxD stimulus occurrence matrix consisting of the 
lagged stimulus covariates for the experimental condition m: 
X'" = [x-,...,xTj with xZ = 

• Hm= 1 ^"^4>h' hence the summation of all stimulus-induced 
signal components which are generated as the convolution 
between the paradigm encoded in X"' and the hemodynamic 
filters 0;;'. 

For the sake of simplicity, multiple-run data are not considered 
here but all implemented methods can handle such data with a 
fixed-effect model (same effect size across runs), a homoscedas- 
tic noise model (one noise variance for all runs) and run-specific 
drift coefficients (see discussion for further extensions). 

To give a first overview of how this generative model structure 
is derived in the different approaches. Table 1 provides a compar- 
ison in terms of regularization, number of unknowns and analysis 
duration. Embedded spatial regularization is only available in the 
JDE procedure, while temporal regularization is available in RFIR 
and JDE (Table 1 — 1st, 2nd rows). In terms of constraints applied 
to the HRF shape (Table 1— 3rd row), the basis set GLM (BS 
GLM) is the most constraining and the shape captured depends 
on the choice of the function basis. In the FIR, RFIR and JDE 
cases, any form of HRF shape can be recovered, provided that 
they are smooth in the case of RFIR and JDE. On Table 1 — 4th 
row, the information on the number of unknowns conveys the 
level of parsimony of a given model. BS GLM, FIR and RFIR 
have increasing model complexity as the number of parameters 
for the HRF increases. In contrast, JDE achieves larger parsimony 
by making the number of unknowns associated with the HRF 
dependent on the number of parcels rather than on the num- 
ber of positions. When computing the ratio between the number 
of unknowns and the number of data points for a typical fMRI 
experiment (Table 1 — 5th row), it appears that JDE is comparable 
to a GLM with derivatives. The RFIR presents the worst situation 
with three times more unknowns than data points. In terms of 
analysis duration (Table 1 — last row), GLM methods are almost 
instantaneous as their inference is straightforward. RFIR relies on 
an iterative scheme to perform unsupervised estimation of the 
amount of temporal regularization and is hence much slower. In 
addition, the implementation of RFIR is done in pure python 
with a main loop over positions which worsen its slow compu- 
tation speed (~30h. for a whole brain analysis)*. Therefore, this 



' Over-sampling could be performed in the case of FIR GLM but is not advis- 
able in terms of estimability since some FIR coefficients may be poorly or 
even not associated with paradigm covariates in matrix JC", depending on 
the paradigm jittering. 

*Note that the RFIR approach with supervised regularization is much faster 
with an analysis duration of 20 min. since the maximum a posteriori estimator 
admits a closed form expression. 
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Table 1 | Comparative overview for all detection/estimation analysis procedures available in pyhrf in terms of model structure and analysis 
duration. 





BS GLM 


FIR GLIV! 


RFIR 


JDE 


Spatial regularization 


Smoothing 


Smoothing 


Smoothing 


Adaptive 


Temporal regularization 


None 


None 


2nd order deriv. 


2nd order deriv. 


HRF shape constraint 


Function basis 


Free 


Smooth 


Smooth 


Number of unknowns 


Jx Rx M 


J X Dx M 


J X M X (D+1) 


2 X J X M + r X (D + 4M+1) 


for the stimulus-induced 


^ ^ R^3 


D% 10 


10 ^ 50 


10 < D ^ 50, 


component 








r w400 


Typical ratio of 


0.23 


0.78 


3.4 


0.16 


unknowns/data 










Analysis duration 


3 min 


5 min 


30 h 


8h 



"2nd order deriv." stands for a penalization on tt)e energy of the HRF which penalizes abrupt shape changes. The number of nuisance parameters was considered 
the same for all models, so that only the modeling of the stimulus-induced component is relevant to assess model parsimony. The ratio "unknowns/data " is given 
for a typical fMRI data analysis with J= 4x l(f,R=3,D=40,M=10,r = 400, and N = 128 (total number of data points: N x J). The analysis duration is for a 
whole brain data treatment on an Intel Core 15 (M480 2.67 Ghz). 



approach is rather hmited to the processing of some regions of 
interest where we expect cerebral activity instead of whole brain 
data analysis. The computation speed of JDE is also slow, but to 
a lesser extent as results can be obtained overnight (~8h. for a 
whole brain analysis) on a single processing unit. All these consid- 
erations on speed have to be nuanced with the access to increasing 
computing power and distributed computations, as will be seen in 
section 3.3. 

2.2. 1. Basis set General Linear Model 

In any position of the brain, the basis set GLM (BS GLM) 
allows for some limited hemodynamic fluctuations by mod- 
eling the hemodynamic filter function in Equation (1) as 
a weighted sum of the fixed canonical HRF denoted he and 
its first and second order derivative h'^, h" as proposed in 
Friston (1998). The generative model, illustrated in Figure lA, 
reads: 

M 

Vj, yj = J2 X'-iP^h, + p'fK + P'TK) + Pij + h (2) 



where fij', p'j , j are the unknown effects associated with 

the nfi stimulus-induced regressors constructed with the fixed 
known vectors h^, h'^, h", respectively. To obtain the classical 
GLM with only the canonical HRF, /Sj and fi" can be set to 
zero for all positions. It is worth noting that this formulation 
of the forward model is equivalent to the classical one where all 
regressors are gathered in the design matrix (noted X) and all 
corresponding effects gathered in a single vector p. Equation (2) 
reads: 

Vj, = X'p^ + b,, (3) 
with: X = [X^K I ■ ■ • I X'"h, I X^h'^ I • • • I X"'h'^ I X^h'^ \ 
■•• \X"'h';\P]' , 



The hemodynamics fluctuations caught by such a model are 
limited to ~ 1 s around the peak of the canonical HRF which 
is at 5 s, see Calhoun et al. (2004). This model is massively 
univariate since every position j is analyzed independently, i.e., 
no correlation between neighboring signals is considered. It 
works well on spatially smoothed data to counter-balance the low 
signal-to-noise ratio, at the expense of blurred activation clusters. 
In the nipy implementation of the GLM, the fitting process can 
be performed using ordinary least squares in the case of white 
Gaussian noise or using Kalman filtering in the case of an AR(l) 
Gaussian noise process. 

2.2.2. FIR GLM and Regularized FIR 

The generative BOLD signal modeling in the FIR context 
encodes all HRF coefficients as unknown variables (illustrated in 
Figure IB): 



M 



(4) 



Here, vector hj" = {hJ'^^fy^_Q ^ represents the unknown HRF 

time course in voxel j which is associated with the rrfi^ experimen- 
tal condition and sampled every Af . In its un-regularized version, 
the FIR model can be expressed in the GLM framework and hence 
its implementation in pyhrf relies on nipy. 

In the case of the Regularized FIR (Ciuciu et al, 2003), 
the problem is placed in the Bayesian formalism in order to 
inject regularity on the recovered HRF coefficients hj. More 
specifically, /ij" ~ 7V(0, v^.n i?) with = (D*D2)"^ where 
D2 is the second-order finite difference matrix enforcing local 
smoothness by penalizing abrupt changes quadratically and v^jm 
is the unknown HRF prior variance which is jointly estimated. 
Computational and inference details are given in Ciuciu et al. 
(2003). 

2.2.3. Joint detection-estimation 

The functional mask of a given subject's brain is a priori divided 
in r functionally homogeneous parcels using methods described 
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FIGURE 1 I Forward models generating the stimulus-induced 
components for the methods available in pyhrf . In all cases, the scheme 
involves two experimental conditions colored in blue and with four 

stimulation events as depicted by vertical bars over the TR-sampled grid. (A) 
General Linear Model (GLM). For a given condition in a given voxel, the stimulus 
event sequence is convolved with the fixed canonical HRF resulting in a fixed 
stimulus-induced regressor. This regressor is then multiplied by an unknown 
effect p'". All the condition-specific regressors are then summed to form the 



final stimulus-induced signal sy. (B) Finite Impulse Response (FIR) Model. In a 
given voxel, the stimulus event-sequence is convolved with an unknown FIR 
vector h.™ for each condition to yield a condition-specific component. All 
components are then summed to form the final stimulus-induced signal Sj. (C) 
Joint Detection-Estimation (JDE). For a given voxel in a given parcel Vy, the 
stimulus sequence gathering all experimental conditions is multiplied by the 
response levels la"). Then, this spike signal is convolved with an unknown 
spatially-invariant HRF /i to form the stimulus-induced signal sy. 



in section 2.3.2. In each parcel 'Py, the shape of the HRF hy is 
assumed constant and the parcel-specific generative model reads: 

M 

V; eVy, yj=Y,afX"'hy+ PI, + bj. (5) 

m = 1 

where yj, X'", P, ij, and bj match the variables introduced 
in section 2.2.1. As shown in Figure IC which illustrates this 
forward model, the a"' variables encode fluctuations that occur 
before the application of the hemodynamic filter. Therefore, 
they are attributed to neural effects and referred to as "Neural 
Response Levels" (NRL). However, this term, which is historical, 
might be misleading as it is difficult to disentangle the contribu- 
tion of the neural and the vascular components from single BOLD 
fMRI data. These variables can be more simply identified to the 
voxel- and condition-specific response amplitudes. 

In contrast to Equation (2) for the GLM forward model, the 
fixed HRF components he and h'^ are replaced by an unknown 
parcel-based HRF hy . Similarly, each unknown NRL a™ embod- 
ies a single magnitude parameter per regressor whereas the GLM 
formulation implies that the magnitude is distributed between 
weights jS™, and P'"^. To summarize, the HRF shape and the 
BOLD response magnitude are coupled in the GLM formulation 
whereas they are decoupled in the JDE formulation. 

In the Bayesian framework, priors are formulated to (1) 
enforce temporal smoothness on the HRF shape to perform esti- 
mation in the same manner as for RFIR and (2) account for spatial 



correlations between NRLs through spatial mixture models to 
perform detection, as described in Vincent et al. (2010). The spa- 
tial regularization factor is jointly estimated and optimized wrt 
parcel topology so as to perform an adaptive spatial smoothing. 
If the experimenter is not interested in the estimation of the HRF, 
then the HRF can be fixed typically to its canonical version in 
the JDE framework, which hence amounts to a spatially adaptive 
GLM. The latter approach enables parcelwise multivariate detec- 
tion of activations with adaptive regularization across parcels. 
As shown at the group-level in Badillo et al. (2013b), this strat- 
egy retrieves more peaked and less extended activation clusters 
compared to classical SPM-like analysis. 

The inference is performed by a stochastic sampling scheme 
where posterior mean estimates are computed from Markov 
Chain Monte Carlo samples. The implementation of the main 
sampling loop is coded in pure python and some intensive sam- 
plers such as the one for the HRF of the NRLs are coded in C. 
Still, the overall JDE procedure is computationally demanding. 
However, since there are as many independent models as parcels, 
the analysis can be split up into parcel-wise parallel analyses (see 
section 3.3). The efficiency of the inference scheme has also 
been improved by resorting to a variational formulation of the 
JDE (Chaari et al, 2013) which is also available in pyhrf. 

2.3. PARCELLATION 

2.3.1. Spatial parcellation 

2.3.1.1. Random Voronoi diagrams. A Voronoi diagram consists 
of a spatial partitioning that builds parcels around predefined 
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control points or seeds. The parcel boundaries are placed so that 
each point of a given parcel is closer to the associated parcel 
seed than any other seed in terms of the Euclidean distance, as 
illustrated in Figure 2 (left). To build a parcellation from such 
partitioning, i.e., to assign each cerebral position to a parcel 
identifier, we do not explicitly require the parcel boundaries. 
Accordingly, there is no need to rely on classical algorithms that 
precisely compute these boundaries. Instead, a given position is 
assigned to the closest seed by resorting to a kd-tree C). 

Random Voronoi parceUations are convenient ways to generate 
samples in the space of sensible parceUations as they produce con- 
vex and compact parcels which are physiologically plausible. They 
have been used in Vincent et al. (2008) to study the sensitivity of 
the parcel-based JDE method. 

2.3.1.2. Balanced partitioning. The goal of balanced partition- 
ing is to build parcels of equal sizes. In the case of a non-regular 
topology such as the brain, there is no morphological tool to 
deterministically solve such partitioning problem which is known 
to be NP-complete as mentioned in Andreev and Racke (2004). 
Hence, the algorithm implemented in pyhrf employs a heuris- 
tic and relies on a multi-agent system that mimics the inflation 
of balloons in a fixed volume (Elor and Bruckstein, 2009), as 
illustrated in Figure 2 (right). 

Balanced partitioning is useful to test the effect of parcel size. 
In pyhrf, balanced partitioning is implemented in pure python 
with a position-wise main loop and is hence rather slow: ~ 1 min 
to split 6000 voxels into 20 parcels. However, this performance is 
sufficient since we only employ balanced partitioning in the case 
of small scale testing data sets or when parcels obtained on real 
data are too big. 

2.3.2. Functional parcellation 

The main goal of functional parcellation is to provide homo- 
geneous parcels with respect to hemodynamics. It is mainly 
motivated by the JDE procedure which assumes that the HRF 
shape is constant within one parcel. To provide such parcellation, 
results obtained from a GLM analysis, or any given task-specific 



^implemented in scipy . spatial . KDTree 



functional maps are clustered using different available algorithms: 
K-means, Ward or spatially-constrained Ward as provided by 
scikit-learn^ To objectively choose an adequate number of 
parcels, theoretical information criteria have been investigated in 
Thyreau et al. (2006): converging evidence for T ~ 400 at a spa- 
tial resolution of 3 x 3 x 3mm^ has been shown for a whole brain 
analysis leading to typical parcel sizes around a few hundred vox- 
els 2.7cm^). As the parcel size is not fixed, some big parcels 
may arise from the parcellation process and may slow down the 
overall parallel processing. To overcome this, the maximum parcel 
size was controlled by splitting too big parcels (larger than 1000 
voxels) according to the balanced partitioning presented in sec- 
tion 2.3.1, which also guarantees the spatial connexity and thus 
properly satisfies the JDE assumptions on the HRF. 

Such "hard clustering" approach yields sharp parcel bound- 
aries that prevent from capturing smooth transitions between 
HRF territories. To avoid wrong boundaries, one can resort to 
over-segmented parceUations (high number of parcels). 

3. PYHRF 

The installation of pyhrf relies on the setup too Is python 
package and requires the following dependencies: numpy ^ 
and scipy* for core algorithms, nibabel for nifti or gifti 
input/outputs, nipy for the GLM implementation and par- 
cellation tools, matplotlib for plots and PyQT4 for GUIs. 
Optional dependencies comprise joblib, scikit-learn 
and soma-workf low. pyhrf is mainly intended for linux- 
based distributions as it has especially been developed under 
Ubuntu. Installation notes and documentation can be found 
online at http://www.pyhrf org. Within the package, the following 
data files' are shipped: 

• two volume-based fMRI data sets (paradigm as CSV files, 
anatomical and BOLD data files). One serves quick test- 
ing while the other is intended for validation/demonstration 
purpose, which is used to generate results in section 4.3, 

• one surface-based fMRI data set mainly intended for testing, 

• several simulation resources in the form of png images 
to provide 2D maps of various activation labels and HRF 
territories. 

The rest of this section is organized as foUows. First, the overall 
workflow of how to use pyhr f is presented, which mainly resorts 
to command lines and some dedicated GUI tools. Second, to go 
further into the package architecture and also to address some fea- 
tures available when scripting, the design of pyhr f is introduced. 
Third, distributed computation is explained in terms of resource 
handling. Finally, the pyhrf viewer is presented with a focus on 
ergonomics. 

3.1. WORKFLOW 

The typical usage of pyhrf relies on shell commands which 
work on XML files. This XML format was chosen for its hier- 
archical organization which suits well the nested nature of the 



^sklearn . cluster .ward 
^ www. numpy. org 
^www.scipy.org 

'There is no special licence on the shipped data sets. 




FIGURE 2 I Illustration of spatial parcellation metliods in pyhrf. Left: 

Voronoi diagram wliere seeds are represented as crosses. Tine red point is 
assigned to the red seed and verifies that its distance to any other seed is 
larger (dl < d2, dl < d3). Right: Balanced partitioning performed by 
patrolling a(ge)nts, image extracted from Elor and Bruckstein (2009). 
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algorithm parametrizations. A dedicated XML editor is pro- 
vided with a PyQt4 graphical interface for a quick editing 
and also a better review of the treatment parameters. When 
such an XML setup file is generated, it defines a default anal- 
ysis that involves a small volume-based real data set shipped 
with the package. This allows for a quick testing of the algo- 
rithms and is also used for demonstration purposes. Here is 
a typical example of shell commands used to perform a JDE 
analysis: 



$ pyhrf_jde_buildcf g 
-o jde.xml 

$ pyhrf _xmledit jde.xml 

$ pyhrf_jde_estim -c 

j de . xml 
$ pyhrf_view *nii 



# generate a 
default XML 
file 

# set up custom 
experiment 

# run the 
analysis 

# view all output 
nifti files 



The "buildcfg" command offers various options to define setup 
items from the command line without having to edit the XML 
file. For example, the paradigm can be loaded from a CSV or a 
SPM.mat file. As for the JDE procedure specifically, the option 
- -vem enables the variational EM approach developed in Chaari 
etal. (2013). 

3.2. DESIGN 

An overview of the static design of the main package compo- 
nents of the package is shown in Figure 3. The class FmriData 
is the within-subject fMRI data representation, for any spatial 
support: on the cortical surface, in the volume, or from a sim- 
ulation. This data representation comprises spatially flat data 
(fMRI time series and parcellation) and a connectivity matrix 
which holds the data topology. At the center of the analy- 
sis component is the Analyzer class that handles parcelwise 



data splitting which is done according to the input data par- 
cellation by default, and also takes care of merging parcel- 
specific outputs at the end of the analysis. This Analyzer 
class is then specialized into various method-specific analyz- 
ers: GLM, RFIR and JDE. Note that the analyzer component 
is decoupled from the data component, as is classically done 
in scientific programming because they do not have the same 
life-cycles (e.g., the same model can be applied to various data 
objects). The FmriTreatment packs the data and analysis 
definitions together and handles distributed computation across 
parcels. 

In the following sub-sections, two specific components 
are further explained: XML parametrization through the 
Xmllnitable class, and the handling of arrays with axis 
semantics through the xndarray class. 

3.2. 1. XML parametrization 

The XML format was chosen for its hierarchical organization 
which suits the nested nature of the algorithm parametriza- 
tions. Indeed, for a JDE analysis, here is an example of such 
different levels: treatment — > analyzer sampler 
hrf sampler. At a given level, different classes may be used 
as there exist, for example, different sampler types depend- 
ing on the type of prior expressed in the JDE model, so 
that we require a seamless parametrization process that avoids 
rewriting code for the building of parameter files each time a 
new model is tested. To do so, any object whose initialization 
has to be exposed in the XML configuration file inherits the 
Xmllnitable class. This system is not a serialization process 
as the whole python object is not dumped in the XML. Only 

the parameters provided to the init function are stored. 

In terms of object life cycle, this process handles object creation 
but is not able to track any subsequent modification. Figure 4 
shows a python code sample that illustrates how the XML file is 
generated from this nested configuration situation. The resulting 
XML file as viewed by the command pyhrf_xmledit is also 
displayed. 



Input Data formats 
nifti I gifti-*- 



csv 



nifti -<- 
SPM.mat-*■ 



loaded from 



Xnnllnitable 




saved/as loaded/from 



FmriData 



parcellation 
functional data 
connectivrEy_matrix 
simulation 
paradigm 



Analyzer 



- split dataO 

- analyse_parcel() 

- merge resultsO 



Parametrization 
XML file 



X 1 




JDEAnalyser 


GLMAnalyser 



Xmllnitable 



applied to 



performed by 
< 



FmriTreatment 



- data 

- analyzer 



runs on 
>• 



Computing 
ressource 
local CPU 
PCs over LAN 
Cluster 




xndarray 


saved as 




Output data 
formats 

nifti 1 gifti 


- data 

- labels 

- domains 


>- 

loaded from 






FIGURE 3 I Static organization of the main components in thie pyhrf 
pacl<age (not exhiaustive). Classes are represented as rounded blue 
rectangles and external resources (file, computing units) as black rectangles. 



Note that the Xmllnitable class is duplicated for layout convenience. As in 
UML class diagrams, arrows have the following meaning: stands for an 
association, — 1> stands for a generalization. 
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from pyhrf .xmlio import Xitilini table , to_xitil 
import numpy as np 

class FmriTreatment (Xmllnitable) : 

def init (self, input_data=None, 

analYsis_paraineters=None) : 
Xmllnitable . init ( self ) 

data = { 'bold_file' : ' . /my_bold . nil ' , 

'paradigm' : np . array ( [ 0 , 2 . 3 , 6 . ] ) } 

analysis = { 'model' : 'JDE_MCMC', 
'mcmc_s amp ling' : { 

'HRF' : { 'duration' : 25, 
' type ' : 
'canonical' }}} 

treatment_xml = to_xml (FmriTreatment (data, 

analysis) ) 
f = open ('. /test . xml ', 'w' ) 
f . write ( treatment_xml ) 
f . close ( ) 



nie Edit Option 



II nm,^^^^^^^ 

□ 9? root 

B ^ FmriTreatment 
B ^ input data 
B ^ paradigm 

^ [g 0.0 2.3 6.0 
1=1 bold_file 

./my bold.nii 
^ analysis parameters 
B- " 



model 

gl JDE MCMC 
1=1 ^ mcmc sampling 
B^^ HRF 

B ^ duration 
L... [g 25 

1=1 type 

i fH canonical 



FIGURE 4 I Handling of XML parametrization. Tlie top part shows a 
code snippet that defines a dummy yet typical fMRI treatment structure 
with nested components. The init process of the resulting top-level object 
is then saved in an XML file. The bottom part is a snapshot of the 
PYhrf_xmledit main window where the XML file generated by the code 
snippet is browsed. 



3.2.2. The xzidarray class: data array with axis semantics 

The development of semantics-driven operations on data arrays 
was motivated by the parcel-driven nature of the analysis work- 
flow which implied that parcel-specific results have to be merged 
in a transparent fashion, whatever their shape. Indeed, as pyhrf 
is the repository of all the methodological tools developed within 
the JDE framework, the number and the form of outputs is 
highly changing during the development and testing process. This 
involves producing convergence tracking, intermediate quantities 
in addition to the final results of interest. To avoid writing a spe- 
cific saving procedure for such versatile and numerous outputs, 
the information about the interpretation of the data axes has to 
be explicit. The class xndarray handles any required reorien- 
tation prior to saving data arrays into nifti or gifti files. In the 
volume-based data case, the reorientation follows the nibabel 
convention that is sagittal, coronal, axial and time. To store the 
extra axis information along with the data, a dedicated nifti- 
extension is also written in the volume-based case or add a 
"pyhrf_xndarray_data" field in the gifti meta data dictionary in 
the surface-based case. 

Moreover, outputs are primarily generated at the parcel-level 
so that they are in a flat shape, i.e., the position axis represent 
indices of positions in the spatial domain. To form the final whole 
brain outputs, the parcel-specific outputs have to be merged 
together and the position axis, if present, has to be mapped into 
the final spatial domain. Table 2 shows two examples of parcel- 
specific outputs that are merged to form whole brain data either 
by spatial mapping or by parcel stacking. To handle these two 
merging operations, stack and merge functions are provided. 
The reverse process is also available via the method explode 
which allows an array to be split according to a mask com- 
posed of integers, i.e., a parcellation. It returns the dictionary of 
"flat" parcel-specific data arrays associated with each integer label 
present in the mask. 

In terms of data life cycle, xndarray objects are used to pre- 
pare data before analysis and to pack results after analysis. During 
the analysis process, it is more convenient to work with numpy 
arrays directly. The following code snippet illustrates the use of 
xndarray objects: functional and parcellation data are loaded, 
within-parcel means are computed and the result is saved to nifti: 

from pyhrf . ndarray import xndarray , merge 

# Data loading 

func_data = xndarray . load ('. /bold. nil ' ) 
parcellation = xndarray. load( '. /parcel latio 
n . nil ' ) 

# Split functional data into parcel- 
specific data 

parcel_fdata = func_data . explode 
(parcellation) 

# Fill parcel-specific data with spatial 
means 

parcel_means = diet ( (parcel_id, d.copy(). 
fill (d.mean( 'position' ) ) ) 

for parcel_id,d in par 
cel_f data . items ( ) ) 

# Merge parcel-specific means (map 
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'position' axis onto spatial axes) 
parcel_means = merge (parcel_means , parcellat 

ion, axis= 'position' ) 
# Save output 

parcel_means . save ( ' . /bold_parcel_ineans . nii ' ) 



by pushing the biggest parcels first. To optimize also, a safeguard 
is imposed on the maximum parcel size (more than 7 cm^ in the 
volume or 11 cm^ on the surface). If a parcel exceeds this limit, it 
is divided up according to the balanced partitioning presented in 
section 2.3.1.2. 



3.3. DISTRIBUTED COMPUTING 

PyHRF provides parallel processing features by exploiting local 
resources (multiple processors on a single workstation) as v^eU as 
remote parallel processing units such as a local grid network or 
a cluster. A whole brain JDE analysis then boUs down from 10 h 
to 15 min in parallel (on a 100-cores cluster). More precisely the 
available computing resources are handled as follows: 

• local multiple-cores CPUs: through the use of j obi ib paral- 
lel features. The latter works by spawning python sub-processes 
that are then run on the different processing units by the oper- 
ating system. The number of used CPUs can be setup by the 
user. 

• machines over a local area network: through in-house code 
that relies on par ami ko and hence uses ssh connections to 
distribute jobs on the LAN. A basic scheduler is implemented 
in pyhrf . grid that can also report faulty remote runs. 

• multi-core cluster: through soma-workf low developed 
by Laguitton et al. (2011), which relies on paramiko'^ on the 
cHent side and on DRMAA'^ on the server side. 

The distribution problem addressed here is a so-called embar- 
rassingly parallel problem where the same treatment has to be 
repeated on several parcel-specific pieces of data. There is no 
shared memory management between distributed processes here. 

To optimize the distribution process, the order in which the 
parcel-specific treatments are pushed in the process queue is done 



'http://brainvisa.info/soma-workflow/ 

http://www.lag.net/paramiko/ 

http://www.drmaa.org/ 



3.4. VIEWER 

pyhrf_view is a dedicated viewer built on PyQt4 which 
embeds a matplotlib view. The purpose of pyhrf_view is 
to provide convenient browsing of volume-based data'^. However, 
it does not provide advanced overlaying features such as the 
display of functional over anatomical data. Instead, to plot the 
final "publication-ready" maps after having selected the results 
of interest with pyhrf_view, one can resort to the command 
pyhrf_plot_slice to directly generate a slice image of functional 
rendering along with anatomical overlay. One can also use a third 
party viewer such as Anatomist^*, FSL_view'^ or xjview'*. 

pyhrf_view offers n-dimensional browsing while most 
viewers in neuro-imaging software handle up to 4D volumes. In 
fact, there is a limit to the number of dimensions inherent to 
the nifti format which permits 7 axes at maximum. The viewer 
is composed of two main components (see Figure 5): 

• a main window handling object and slice selection, 

• plot windows which display the selected slice as curve or 
image. 

The slice selection tools provides sliders to browse through axes 
domain values and display related information: axis name, cur- 
rent selected domain values and projection states. There can be 
up to two projected axes (2D), i.e., axes which will mapped to 
the actual plot axes. When multiple objects are loaded, slicers are 
synchronized to plotting views so that click events yield slider 
updates. This behavior can be modified in two ways. First, the 



Surface rendering is not available. Anatomist is recommended for such 
usage. 

^*http://brainvisa.info 

^^http://fsl. fmrib.ox.ac.uk/fsl/fslview/ 

^*http://www.alivelearn.net/xj views/ 



Table 2 | Examples of merging operations performed on multiple parcel-specific data arrays, for some JDE outputs: parcel-specific HRFs and 
condition- and voxel-specific activation labels. 



Parcel-specific flat data Whole brain data 
Merging operation 

Axis label Axis domain Axis label Axis domain 



HRF Time 


[0, . . ., hrf_duration] 






same 








U 


Parcel 


[0, ... 


, parceLmax] 


Labels Class 


['activ', 'non_activ'l 






same 




Condition 


['audio, 'video'] 






same 




Position 


[0, . . ., pos_maxl 


— > 


Axial 


[0, ... 


, axial_maxl 








Coronal 


[0, ... 


, coronaLmax] 








Sagittal 


[0, ... 


, sagittaLmax] 



If the xndarray object contains the "position" axis, as for the "labels" object, then all parcel-specific results are merged into the same target volume and we depict 
the spatial mapping operation as "-^ " to map the "position" axis in to the spatial axes "axial," "coronal," and "sagittal." For other axes aside from "position," no 
merging operation is performed ("= " symbol). If the xndarray object does not contain the "position" axis, as for the HRF object, then all parcel-specific results are 
stacked and a new "parcel" axis is created ("U " symbol). 
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FIGURE 5 I Main widget components of pyhrf_vlew to browse and 
view n-dimensional data. Left: Tine list widget on top displays the 
currently loaded objects. The slicer panel at the bottom allows: projection 
of axes (combo boxes on the left), domain value slicing (sliders in the 
middle) and definition of view synchronization (combo boxes on the right). 
For a given axis slicer, the two combo boxes defining synchronization are: 
(E) toggle emission of slice change to other slicers, (R) toggle reception 



from other slicers or from click events on plots. IVIiddle: Plot window for 
the current selected slice. The top part displays the actual plot as 
produced by matplolib.pylab. The bottom part offers changing the 
view mode (either curve, image, or histogram), and toggling display of 
axes, colorbar and mask. The color button pops up a gradient map selector 
if in image mode or a color picker if in curve mode. Right: Other plot 
window to illustrate curve display. 



reception combo box toggles whether the slider receives changes 
from other sliders. This is useful when one wants to prevent a 
given view from being updated by synchronization events (with 
reception off), e.g., when a reference slice should be compared to 
other slices. Second, the emission combo box toggles the spread- 
ing of slider changes to all other slicers. This is typically used to 
control a given axis across all displayed objects with a single slider 
(with emission on). 

4. RESULTS 

4.1. EXPERIMENTAL PARADIGM 

In all presented results, whether they focus on artificial or real 
data sets, we resorted to the same experimental paradigm. The lat- 
ter is a multi-functional cognitive localizer paradigm designed in 
Pinel et al. (2007). This paradigm enables the mapping of cogni- 
tive brain functions such as reading, language comprehension and 
mental calculations as well as primary sensory-motor functions. 
It consists of a fast event-related design (60 stimuli, ISI = 3.75 s) 
comprising the following experimental conditions: auditory and 
visual sentences, auditory and visual calculations, left/right audi- 
tory and visual clicks, horizontal and vertical checkerboards. 

4.2. ARTIFICIAL DATA GENERATOR 

Simulations in pyhrf mainly consist of building a script that 
defines a pipeline of versatile simulation bricks presented in 
Table 3. Writing a simulation script as a sequence of functions 
makes things difficult to read and to reuse. Instead, all simulation 
bricks are gathered inside a python dictionary that maps a simu- 
lation label to its corresponding value. This value can be directly 
defined as a python object or as a function which can depend on 
other simulation items and which is called when the simulation 



Table 3 | Different types of simulation bricks available in pyhrf. 

Simulation item Available generation process 

Experimental paradigm Localizer, random event-related 

Activation labels Hand-drawn 2D maps, 3D Potts 

realizations 

Response levels Bi/tri mixture of Gaussian or Gamma 

components 

Hemodynamic response function Canonical, Bezier curve, Gaussian 

smooth 

Low frequency drift Polynomial, cosine 

Noise White, auto-regressive of order p 

The "localizer" paradigm is described in Pinel et al. (2007). Hand-drawn maps 
for activation labels are in the form of png images. Gaussian smooth generation 
of HRFs stands for the regularized prior used in the JDE model. 

pipeline is evaluated. The pipeline structure arises from the link 
between simulation labels and function arguments. An example 
of such simulation script is given below: 

import numpy as np 

from pyhrf . ndarray import xndarray 
from pyhrf .tools import Pipeline 

# Functions used to generate items in the 

simulation Pipeline 
def generate_rls (spatial_shape, mean_rls, 

var_rls) : 

rls = np . random. randn ( *spatial_shape) * 
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var_rls**.5 + mean_rls 
return xndarray (rls , ['axial', 
'sagittal', 'coronal']) 

def generate_noise ( stim_induced_signal , 
noise_var) : 

noise = np . random. randn ( *stim_induced_ 

signal . data . shape) * noise_var* * . 5 
return xndarray . xndarray_like ( stiin_ 
induced_signal , data=noise) 

def create_stiin_induced_signal (rasteRed_ 
paradigm, hrf , response_levels ) : 

signal = np. convolve (rasteRed__paradigm, 
hrf) [np.newaxis, : ] * \ 

response_levels . data [:,:,:, np . newaxis] 
return xndarray (signal , response_levels . 
axes_names + [ ' time ' ] ) 

def create_bold (stim_induced_signal , noise): 
return stim_induced_signal + noise 

# Definition of the simulation pipeline 
simulation_steps = { 

' spatial_shape ' : (10,11,12), 'mean_rls' : 

3 . , ' var_rls ' : 0.5, 
' response_levels ' : generate_rls , 
' rasteRed_paradigm' : np . array ([ 0 , 0 , 1 , 0 , 0 , 

0,1,0,0,0,1]), 
'hrf : np.array( [0, . 5, 1, 0 . 5, 0 . , 0] ) , 
' noise_var ' : 1 . , 
'noise' : generate_noise , 
' stim_induced_signal ' : create_stim_ 

induced_signal , 
'bold' : create_bold, 
} 

simulation = Pipeline ( simulation_steps ) 

# Computation of all quantities in the 
pipeline and data saving 

simulation . resolve ( ) 

simulation_items = simulation . get_values ( ) 
simulation_items [ ' response_levels ' ] . 

save ( ' . /response_levels . nil ' ) 
simulation_items [ ' stim_induced_signal ' ] . 

save ( ' . /stim_induced_signal . nil ' ) 
simulation_items [ ' bold ' ] . save ( ' . /bold . nil ' ) 

The artificial data experiment presented here comprises the gen- 
eration of BOLD time series within the volume and then pro- 
jected onto the cortical surface. To do so, shipped data defines a 
volume of 4 HRF territories, as well as the gray/white matter seg- 
mentation obtained from real data in the occipital region. Within 
the gray matter mask, activation labels are generated and con- 
ditionally to them, response levels are simulated according to a 
bi-Gaussian mixture. For the sake of simplicity, a version of the 



localizer paradigm presented in the previous section is merged 
over the auditory and visual modalities so as to obtain only two 
conditions. In all HRF territories this paradigm is then convolved 
with HRF generated by Bezier curves that enable the control of 
the time-to-peak and time-to-undershoot. Finally, nuisance sig- 
nals are added (Gaussian noise and polynomial drift) to obtain 
the volume of artificial BOLD data. To generate surface-based 
data, data are projected onto a cortical fold that is also shipped 
in the package and we resorted to an external projection tool, 
developed in Operto et al. (2006) but others are available such 
as Freesurf er. Figure 6 presents the results obtained on arti- 
ficial data using the JDE procedure. HRF estimates recover their 
respective ground truth profiles with a slightly more deformed 
curve obtained on the cortical surface for the bottom right (green) 
HRF territory, compared with the volume-based case. Detection 
results (response levels maps in Figure 6) also shows the correct 
recovery of the simulated ground-truth, in the volume and on the 
cortical surface. 

4.3. WITHIN-SUBJECT METHOD COMPARISON 

The analyzed real data set, which is shipped with pyhrf , was 
a subset of an fMRI acquisition performed on a single healthy 
subject with a 3-Tesla Tim Trio Siemens scanner using an EPI 
sequence. The following settings were used for this acquisi- 
tion: the fMRI session consisted of N = 128 scans, each of 
them being acquired using TR = 2400 ms, TE = 30 ms, slice 
thickness: 3 mm, FOV = 192 mm^ and spatial in-plane resolu- 
tion of 2 X 2 mm^ . In order to reduce disk usage and to focus 
only on areas of the brain which are expected to elicit activ- 
ity in response to the paradigm, functional data was restricted 
to selected regions of interest that comprise occipital, tem- 
poral, parietal and motor regions. To improve interpretation 
and data plot rendering, an anatomical image is also shipped, 
with an in-plane resolution of 1 x 1 mm^ and slice thickness 
of 1.1 mm. 

This fMRI data set was analyzed using GLM with a canonical 
HRF, FIR, RFIR and JDE^'. For JDE, the functional parcellation 
was built according to the method described in section 2.3.2. 
Figures 7A,B depicts detection results for the auditory effect, 
obtained by GLM with canonical HRF (see Figure 7A) and 
JDE (see Figure 7B). Both methods highlight the same activa- 
tion localization, with a slightly stronger sensitivity for JDE. 
Figure 7C shows HRF estimation results as obtained by FIR, 
RFIR and JDE at the same local maximum on the left temporal 
region. Note that the HRF estimate provided by the JDE pro- 
cedure is regional. The HRF profile delivered by FIR appears 
noisier than the JDE and RFIR counterparts. Also the tempo- 
ral resolution of FIR is limited to the TR of input data. In 
contrast, RFIR and JDE offer an enhanced temporal resolution 
of 0.6 s In terms of timing, the FIR and JDE methods yield a 
peak at 5 s which is compatible with the canonical HRF that 
has been fitted on temporal auditory regions (Boynton et al., 
1996). Accordingly, the HRF estimates obtained by RFIR seem 
over-smoothed. Overall, JDE enables reliable activation maps and 



analysis scripts are available at http://github.com/pyhrf/pyhrf/tree/master/ 
script/frontiersBI]VI14/ 
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FIGURE 6 I Results on volume-based and surface-based artificial data. 
Left part: HRF estimates obtained by JDE on the four artificial parcels. 
Ground truth HRFs are depicted in black line while colored HRF are HRF 
estinnates that match the color of the parcels. Right part, top: labels 



simulated in the cortical fold for two conditions (in blue and red). Rigfit 
part, bottom: Response levels estimates obtained by JDE on the cortical 
surface and in a selected slice of the volume. 3D renderings were 
produced with anatomist. 



HRF profiles that can roughly be obtained by separate GLM 
and FIR analyses. Figures 7D,E shows results on effect maps for 
the computation effect, obtained by GLM with canonical HRF 
(see Figure 7D) and JDE (see Figure 7E). JDE results have a 
higher sensitivity which can be explained by an estimated HRF 
that differs from the canonical version (see Figure 7F). More 
specifically on the HRF estimation results shown in Figure 7F, 
we can make the same comments as for the auditory results. 
However, the FIR HRF profile is here more chaotic and its peak 
is less easy to identify as the curve shows a plateau between 
7 and 10 s 

4.4. GROUP-LEVEL HEMODYNAMICS 

Using pyhr f , the hemodynamic variability was also studied on a 
group of 15 healthy volunteers (average: 23.2 years, std: 2 years). 
The experimental paradigm is described in section 4.1 and the 
fMRI acquisition parameters are similar to those previously men- 
tioned in section 4.3. The results presented hereafter have been 
published in Badillo et al. (2013b). In this work, hemodynamic 
variability was investigated in four regions of interest, located 
in the left parietal cortex (P), bilateral temporal (T) and occip- 
ital (O) lobes and in the right motor cortex (M), as shown in 
Figure 8. These regions were defined after conducting a random- 
effect analysis to detect activation clusters showing a significant 
group-level effect. More precisely, we defined four contrasts of 
interest targeting brain activity in sensory and cognitive regions: 
a Auditory vs. Visual contrast for which we expect evoked activ- 
ity in temporal regions in response, a Visual vs. Auditory contrast 
that induces evoked activity in the occipital cortex, a Left vs. 
Right click contrast for which we expect evoked activity in the 
right contralateral motor cortex, and a Computation vs. Sentence 



contrast which is expected to highlight activity in the frontal 
and parietal lobes specific to mental calculations. In terms of 
detection performance, at the group-level, JDE and GLM are 
comparable in primary sensory regions (where the canonical 
HRF is appropriate). However, in the parietal region involved in 
higher cognitive processes, the JDE approach yields more sensitive 
maps. In what follows, we summarize group-level hemodynamics 
results obtained in the regions of interest extracted from activated 
clusters. 

The group-level HRF extraction in each ROI involves the fol- 
lowing steps: For each subject, we identify the parcel containing 
the most activated voxel across stimulus-dependent response lev- 
els. Each individual parcel-based HRF time course is then scaled 
by the corresponding maximum response level so as to account 
for the inter-subject variability of the effect size. Last, each group- 
level HRF profile (see Figure 8) is computed as the average over 
the 15 subjects in the corresponding ROI. 

One of the main results concerns the spatial gradient of dis- 
crepancy to the canonical HRF shape between regions. As shown 
in Figure 8, the mean HRF time courses retrieved in occipital and 
temporal regions are the closest to the canonical shape h^. In the 
motor cortex, the HRF deviates a little more from the canoni- 
cal filter, especially in terms of hemodynamic delay. Finally, the 
largest discrepancy to the canonical HRF was found in the parietal 
region. 

5. PERSPECTIVES 

5.1. METHODOLOGICAL PERSPECTIVES 

The main methodological developments are currently tak- 
ing place in the JDE framework. In fMRI activation proto- 
cols, the paradigm usually consists of several runs repeating 
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FIGURE 7 1 Detection and estimation results on the shipped real 
data set. Top and bottom rows: Auditory and computation 
experimental conditions, respectively. Columns from left to right; 
Response level maps, for (A,D) GLM with canonical HRF; (B,E) JDE, 



superimposed with the functional parcellation (white borders). 
Neurological convention: left is left. (C,F): Estimation results for GLM 
FIR (blue), RFIR (green), and JDE (red). The canonical HRF is shown 
in black. 




FIGURE 8 I Left: Definition of regions of interest to investigate 
hemodynamics variability from JDE-based group-level analysis. Top: Sagittal 
view. Bottom: Axial/top view. Left parietal area (P) appears in red, left 
motor area in the pre-central cortex is shown in • , Bilateral temporal 
regions along auditory cortices and bilateral occipital regions in the visual 
cortices are shown in blue and cyan, respectively. Right: Group-average 
HRF estimates for the four regions of interest: hp, Hm, hj, ho stand for 
HRF means in parietal, motor, temporal and occipital regions, respectively. 
he correspond to the canonical HRF 



similar sequences of stimuli. For an increased stability of HRF 
estimates that cope with the between-run variability of the 
response magnitude, a hierarchical multi-run extension with 
heteroscedastic noise has been developed in Badillo et al. (2013c). 



It is particularly useful for pediatric imaging where runs are short 
in time. In the same vein of improving within-subject analyses, an 
approach to encode the condition-specificity at the parcel level is 
being developed to enforce non-relevant conditions to yield null 
activation, as in Bakhous et al. (2013). 

The variational EM version of JDE that has been published 
in Chaari et al. (2013) and that appeared to be 10-30 times faster 
than its MCMC alternative, has allowed us to address (Chaari 
et al., 2012) the additional task of estimating the spatial aggre- 
gation support of HRF shapes (parcellation), which is sup- 
posed given a priori in the current JDE approach. The so-called 
joint Parcellation-Detection-Estimation (JPDE) validation is still 
ongoing. In an attempt to solve the same issue, an alternative 
based on random parcellations and consensus clustering has been 
recently proposed in Badillo et al. (2013a). 

Closely related to the results presented in section 4.4, a multi- 
subject extension of the JDE is currently developed to properly 
account for the between-subject HRF variability and recover a 
meaningful and potentially less biased group-level HRF pro- 
file. This development trail wiU bring modification in the core 
design of pyhr f so as to take into account the new "group" data 
axis. 

Finally, recent work has opened the path to multi-modality 
by processing Arterial Spin Labeling fMRI data (Vincent et al., 
2013). To analyze such data, physiologically-inspired models are 
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investigated to establish parsimonious and tractable versions of 
physiological models such as the balloon model described in 
Friston and Buechel (2000) and Buxton et al. (2004). Hence, 
for validation purpose, the artificial data generator is also being 
enriched with the simulation of physiological models. 

5.2. PACKAGE PERSPECTIVES 

In addition to improving the documentation and usability of the 
current package version, additional developments will be first 
motivated by the above-mentioned methodological perspectives, 
namely re-factoring part of the data design to integrate the group- 
level and multi-session data components. This will mainly involve 
the modification of the FmriData class and the addition of a 
new FmriGroupData class. The handling of data input will 
have to be extended to exploit a hierarchy of subject-specific files. 

We also plan to enrich the parcellation component by 
handling classical atlases such as the Automated Anatomical 
Labeling (AAL) atlas buUt by Tzourio-Mazoyer et al. (2002), the 
Brodmann regions (Brodmann, 1909) and the Harvard-Oxford 
atlas (Desikan et al, 2006) available in FSL The goal is to 
enable the definition of functional parcels that are consistent 
with previous studies in the literature and also to further inves- 
tigate the anatomo-functional link by comparing atlas-driven vs. 
data-driven parceUations. 

In order to offer more user-friendliness, the construction of 
a unified graphical user interface is foreseen, which wOl gather 
together the XML editor and the viewer while also enabling the 
selection of the analysis type. We also envisage resorting to wiz- 
ard interfaces to guide the setup process and deliver contextual 
documentation. In terms of browsing features, tools to prop- 
erly explore the surface-based results are currently missing, as we 
resort to an external tool, anatomist. The goal is not to repro- 
duce all the features offered by the latter which enable the output 
of paper-ready figures through joint volume/surface rendering, 
data fusion and material handling. We rather think of a simple 
textured mesh viewer associated with a picking feature in order to 
synchronize other views. The main usage is to make the selection 
of a mesh node and the corresponding HRF estimate feasible. For 
making this surface-based rendering available, mayavi " is an 
appealing candidate since it has been already intensively used in 
the python community. 

Finally, we plan on incorporating GPU parallel computing 
features. This technology is becoming more and more available 
and powerful and may also appear cheaper than CPU comput- 
ing systems (see Owens et al, 2008 for a review). Specifically, the 
NVIDIA chipsets are easily accessible for general purpose com- 
puting through the python package pyCUDA^°. A simple test on 
matrix products with a complexity similar to that of our models 
showed a gain of one order of magnitude in favor of GPU com- 
putations^^ (NVIDIA GeForce 435M graphics card) compared to 
CPU-based computations (Intel Core M480 @ 2.67 GHz) with 
numpy. 



^^http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/ 
^'http://code. enthought.com/projects/mayavi/ 
^"httpV/developer.nvidia.com/pycuda 

^^benchmark available at http://wiki.tiker.net/PyCuda/examples/ 
DemoMetaMatrixmulCheetah 



6. CONCLUSION 

The pyhrf package provides tools to detect evoked brain activ- 
ity and estimate the underlying dynamics from fMRI data in the 
context of event-related designs. Several "reference" methods are 
available: the GLM, FIR and RFIR approaches, and also more 
flexible models as provided by the JDE framework. The choice 
of the analysis tools depends on the experimenter's goal: if sim- 
ple mappings are required, the GLM is appropriate provided that 
the HRF is expected to be close to its canonical version, but for 
finer dynamic estimation, the JDE procedure is more suitable. 
The design of pyhr f allows the handling of volume-based and 
surface-based data formats and also the utilization of several dis- 
tributed computing resources. The main user interface is done by 
shell commands where the analysis setup is stored in an XML con- 
figuration file. Two graphical components are provided: an XML 
editor and a «-dimensional volume-based data browser. 

This package provides valuable insights on the dynamics of the 
cognitive processes that are not available in classical software such 
as SPM or FSL. Hence, it offers interesting perspectives to under- 
stand the differences in the neuro-vascular coupling of different 
populations (infants, children, adults, patients, etc.). 
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